Effects of Response Frequency Constraints on Learning in a Non-Stationary Multi-armed Bandit Task
Authors
Abstract
Similar articles
Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
In a multi-armed bandit (MAB) problem, a gambler must choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are observed only when an arm is selected, and the gambler's objective is to maximize his cumulative expected earnings over some given horizon of play T. To do this, the gambler needs to acquire information about arms (ex...
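To make this setup concrete, the minimal sketch below simulates the stationary version of the problem with the standard UCB1 index policy. It is a generic illustration, not the algorithm analyzed in the cited paper; the `pull` callback, the Bernoulli arm probabilities, and the horizon are assumptions for the example.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Minimal UCB1 sketch; `pull(arm)` returns a stochastic reward in [0, 1]."""
    counts = [0] * n_arms      # number of times each arm has been played
    means = [0.0] * n_arms     # empirical mean reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:        # play every arm once to initialize the estimates
            arm = t - 1
        else:                  # otherwise pick the largest upper confidence bound
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        total += reward
    return total

# Hypothetical two-armed Bernoulli instance with success probabilities 0.3 and 0.6.
print(ucb1(lambda a: float(random.random() < (0.3, 0.6)[a]), n_arms=2, horizon=10_000))
```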
Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
Multi-armed bandit tasks have been used extensively to model the problem of balancing exploitation and exploration. One of the most challenging variants of the MAB problem is the non-stationary bandit problem, in which the agent faces the added complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite-horizon bandit problem with a finite number ...
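A common baseline for the change-detection difficulty described here is simply to forget old data: base each arm's estimate on a sliding window of recent rewards. The sketch below pairs that idea with epsilon-greedy selection; it is an assumed illustration rather than the reinforcement-learning or evolutionary methods the paper examines, and the window size, epsilon, and change point are made up.

```python
import random
from collections import deque
from itertools import count

def sliding_window_greedy(pull, n_arms, horizon, window=100, eps=0.1):
    """Epsilon-greedy over per-arm sliding windows, so estimates track drift."""
    recent = [deque(maxlen=window) for _ in range(n_arms)]  # last rewards per arm
    total = 0.0
    for _ in range(horizon):
        if random.random() < eps or any(not d for d in recent):
            arm = random.randrange(n_arms)   # explore, or fill unplayed arms
        else:
            arm = max(range(n_arms), key=lambda a: sum(recent[a]) / len(recent[a]))
        reward = pull(arm)
        recent[arm].append(reward)
        total += reward
    return total

# Hypothetical abruptly changing environment: the better arm switches at t = 5000.
clock = count()
def pull(arm):
    p = (0.8, 0.2) if next(clock) < 5000 else (0.2, 0.8)
    return float(random.random() < p[arm])

print(sliding_window_greedy(pull, n_arms=2, horizon=10_000))
```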
Multi-Armed Bandit for Pricing
This paper studies Multi-Armed Bandit (MAB) approaches for pricing applications, where a seller needs to identify the selling price for a particular kind of item that maximizes her/his profit without knowing the buyer demand. We propose modifications to the popular Upper Confidence Bound (UCB) bandit algorithm that exploit two peculiarities of pricing applications: 1) as the selling...
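As an unmodified baseline for this setting, a plain UCB policy can be run over a discrete grid of candidate prices, with reward equal to realized revenue. The demand curve, price grid, and horizon below are illustrative assumptions; the paper's actual contribution is the pricing-specific modifications to UCB, which this sketch does not include.

```python
import math
import random

def price_ucb(buy_prob, prices, horizon):
    """Plain UCB over a grid of candidate prices; reward is price * 1{sale}."""
    n = len(prices)
    counts = [0] * n
    means = [0.0] * n                 # empirical mean revenue at each price
    r_max = max(prices)               # revenues lie in [0, r_max]
    for t in range(1, horizon + 1):
        if t <= n:
            i = t - 1                 # try each candidate price once
        else:                         # scale the confidence width by r_max
            i = max(range(n),
                    key=lambda j: means[j] + r_max * math.sqrt(2 * math.log(t) / counts[j]))
        sale = random.random() < buy_prob(prices[i])   # buyer accepts or rejects
        revenue = prices[i] * sale
        counts[i] += 1
        means[i] += (revenue - means[i]) / counts[i]
    return prices[max(range(n), key=lambda j: means[j])]

# Hypothetical linear demand: purchase probability falls from 0.9 to 0.1 over [1, 9].
best = price_ucb(lambda p: max(0.0, 1.0 - 0.1 * p), prices=[1, 3, 5, 7, 9], horizon=20_000)
print("estimated best price:", best)
```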
Optimal Exploration-Exploitation in a Multi-Armed-Bandit Problem with Non-stationary Rewards
In a multi-armed bandit (MAB) problem, a gambler must choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are observed only when an arm is selected, and the gambler's objective is to maximize his cumulative expected earnings over some given horizon of play T. To do this, the gambler needs to acquire information about arms (ex...
Pure Exploration of Multi-armed Bandit Under Matroid Constraints
We study the pure exploration problem subject to a matroid constraint (Best-Basis) in a stochastic multi-armed bandit game. In a Best-Basis instance, we are given n stochastic arms with unknown reward distributions, as well as a matroid M over the arms. Let the weight of an arm be the mean of its reward distribution. Our goal is to identify a basis of M with the maximum total weight, using as f...
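For intuition about the objective, the sketch below estimates each arm's weight by naive uniform sampling and then builds a maximum-weight basis for the empirical weights with the standard matroid greedy rule. It is only a sketch of the problem under assumed parameters, not the sample-efficient Best-Basis algorithm of the paper; the independence oracle, arm means, and sample budget are hypothetical.

```python
import random

def empirical_best_basis(pull, n_arms, independent, samples=200):
    """Estimate arm means, then greedily assemble a basis.

    `independent(S)` is a matroid independence oracle over sets of arm indices.
    Adding arms in decreasing empirical mean while staying independent yields a
    maximum-weight basis for the empirical weights (the matroid greedy rule).
    """
    means = [sum(pull(a) for _ in range(samples)) / samples for a in range(n_arms)]
    basis = set()
    for a in sorted(range(n_arms), key=lambda a: means[a], reverse=True):
        if independent(basis | {a}):
            basis.add(a)
    return basis

# Hypothetical instance: 5 Bernoulli arms, uniform matroid of rank 2 (any <= 2 arms).
mu = [0.2, 0.9, 0.5, 0.7, 0.1]
print(empirical_best_basis(lambda a: float(random.random() < mu[a]),
                           n_arms=5, independent=lambda s: len(s) <= 2))
```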
Journal
Journal title: International Journal of Comparative Psychology
Year: 2014
ISSN: 0889-3667, 2168-3344
DOI: 10.46867/ijcp.2014.27.02.07